7

Appendices

Appendix A: XML Quick Reference

This section provides a review of XML features and conventions for quick reference.

The parts of an XML document

An XML document consists of the following parts, in this order:

1. An XML declaration (optional, but highly recommended)

2. A DOCTYPE declaration and DTD (optional), including comments, processing instructions, and entity references

3. XML elements (and their attributes), comments, processing instructions, and entity references

XML declaration

The XML declaration, if included, must be the first line in an XML document. It indicates the version of XML that the document adheres to, and whether the file includes any references to other files. For example:

<?xml version="1.0" standalone="no"?>

DOCTYPE declaration (including DTD)

The DOCTYPE declaration, which specifies the document's DTD, goes after the XML declaration and before the opening tag of the root element. There are two potential parts to any DTD: the external subset and the internal subset. If a document has only an external subset, it looks like this:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE rootElement SYSTEM "URL of DTD">
<rootElement>Content goes here.</rootElement>

If a document has only an internal subset, it looks like this:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE rootElement [

]>
<rootElement>Content goes here.</rootElement>

If a document has both an external subset and an internal subset, it looks like this:

<?xml version="1.0" standalone="no"?>

<!DOCTYPE rootElement SYSTEM "URL of DTD" [

]>
<rootElement>Content goes here.</rootElement>

Elements

An element consists of an opening tag (<tagName>), some content, and a closing tag (</tagName>):

<tagName>Content goes here.</tagName>

An exception is the empty tag, which may be a single tag with a forward slash before the closing >:

<tagName/>

All elements must be properly nested, meaning that the most recently opened tag must be closed before you can close any other tags. For example, the following line would be illegal in an XML document because it does not close <tag2> before closing <tag1>:

<tag1><tag2>Content goes here.</tag1></tag2>
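The nesting rule can be checked with any non-validating XML parser. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module (not part of avenue.quark); the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser raises ParseError for mismatched or unclosed tags.
        return False


# Properly nested: tag2 is closed before tag1.
good = "<tag1><tag2>Content goes here.</tag2></tag1>"
# Improperly nested: tag1 is closed while tag2 is still open.
bad = "<tag1><tag2>Content goes here.</tag1></tag2>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A check like this reports only whether the text is well-formed; it does not test the document against a DTD.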

Each XML document must have a root element that contains all the other elements in the document.

Element names are case-sensitive. Each element name must begin with a letter or an underscore (_); subsequent characters in the name may be letters, underscores, numbers, hyphens and periods, but not spaces or tabs.

Attributes

Elements may have attributes as part of their opening tag (or, for empty elements, as part of the single opening/closing tag). An attribute consists of an attribute name followed by an equals sign and then an attribute value in quotation marks. For example:

<elementName attributeName="attributeValue">Content</elementName>
<elementName attributeName="attributeValue"/>

Comments

A comment consists of text between a <!-- and a -->. The content of comments should be ignored by XML processors. Comments may not contain "--" and they may not contain other comments.

Processing instructions

A processing instruction consists of text between a <? and a ?>. Processing instructions are read only by XML processors and may not contain content. The syntax for processing instructions is as follows:

<?target instruction?>

Character references

A character reference is a way of representing Unicode characters in parsed character data. The syntax for character references is as follows:

&#UnicodeValueOfCharacter;
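As an illustration, the sketch below resolves two character references to the trademark sign (Unicode value 8482). It assumes Python's standard xml.etree.ElementTree module as the parser; the element name p and the sample text are made up. The second reference uses the hexadecimal form (&#x followed by the hexadecimal value), which XML also accepts.

```python
import xml.etree.ElementTree as ET

# 8482 (decimal) and 2122 (hexadecimal) both identify the trademark sign.
element = ET.fromstring("<p>Brand&#8482; and Brand&#x2122;</p>")

# The parser replaces each reference with the character it names.
print(element.text)
```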

Entity references

An entity reference is a name that represents a specific character, text string, or file. Entity references in an XML document are always between an ampersand (&) and semicolon (;). For example, &lt; represents a less-than sign (<), which may not be included in XML content except as an entity reference.

The meaning of each entity reference used in an XML document must be defined in the document's DTD, with the exception of the following predefined character entity references, which may be used without being defined:

Character	Entity reference
<	&lt;
>	&gt;
&	&amp;
"	&quot;
'	&apos;
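When text is generated by a program, the predefined entity references are usually inserted by an escaping routine rather than by hand. The following sketch assumes Python's standard xml.sax.saxutils module; the sample string and the names quote_entities and reverse_entities are illustrative only.

```python
from xml.sax.saxutils import escape, unescape

# escape() always replaces &, <, and >; the two quote characters are opt-in,
# so they are supplied here as an extra mapping.
quote_entities = {'"': "&quot;", "'": "&apos;"}
reverse_entities = {"&quot;": '"', "&apos;": "'"}

raw = 'if a < b & b > c, say "it\'s sorted"'
escaped = escape(raw, quote_entities)

print(escaped)
# Unescaping with the reverse mapping restores the original text.
print(unescape(escaped, reverse_entities) == raw)
```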

Well-formed XML

To be well-formed, an XML document must follow these rules:

The first line should be an XML declaration
There must be an end tag for every start tag (except for single empty tags)
Single empty tags must end with />
There must be a root element that contains all other elements
All elements must be properly nested, meaning that the most recently opened tag must be closed before you can close any other tags
All attribute values must be in quotation marks (either "" or '')
All tags must begin with < and all entities must begin with &
The only entity references that may be used unless the document has a DTD are the predefined character entity references listed above

Valid XML

A valid XML document is an XML document that is well-formed and adheres to the DTD specified by its DOCTYPE declaration.

Appendix B: DTD Quick Reference

This section provides a review of DTD features and conventions for quick reference.

The parts of a DTD

A DTD may be composed of the following parts, in no particular order:

Element type declarations
Attribute declarations
Comments
Entity reference declarations
Notation declarations
Processing instructions
Parsed entity references
Conditional sections

Element type declarations

The syntax for an element type definition is as follows:

<!ELEMENT elementName (elementContent)>

Element content may consist of parsed character data (that is, text and entity references, expressed as #PCDATA) and/or other element types. The following symbols may be inserted after any element name or closing parenthesis in the element content definition:

Symbol	Meaning
None	Exactly one
+	One or more
*	Zero or more
?	Zero or one

To require one element to be followed by another, use a comma:

<!ELEMENT elementName (element1, element2)>

To indicate that content can include one element or another, use a |:

<!ELEMENT elementName (element1 | element2)>

To allow an element to contain a combination of specific elements and #PCDATA in any order, use the following syntax:

<!ELEMENT elementName (#PCDATA | element1 | element2)*>

To allow an element to contain any combination of elements and #PCDATA in any order, use the following syntax (note omission of parentheses):

<!ELEMENT elementName ANY>

To define an empty element, use the following syntax (note omission of parentheses):

<!ELEMENT elementName EMPTY>

Attribute declarations

The syntax for a single attribute definition is as follows:

<!ATTLIST elementName attributeName attributeType defaultValue>

Attribute names are case-sensitive. Each attribute name must begin with a letter or an underscore (_); subsequent characters in the name may be letters, underscores, numbers, hyphens and periods, but not spaces or tabs.

Attribute types may be as follows:

Attribute type	Meaning
CDATA	Character data and entity references, between quotation marks ("")
ID	Must contain a unique name* for each element of this type
IDREF	The unique ID name* of an element in the XML file
ENTITY	An unparsed external entity reference name* defined in the DTD
ENTITIES	A list of ENTITY names, separated by spaces
Enumerated	A list of names*, separated by | characters, in parentheses
NMTOKEN	A value containing only NameChar characters**
NMTOKENS	A list of NMTOKENs, separated by spaces
NOTATION	The name of a notation defined in the DTD
Enumerated NOTATION	A list of NOTATIONs, separated by | characters, in parentheses

*Names must begin with a letter or an underscore (_); subsequent characters in the name may be letters, underscores, numbers, hyphens, and periods, but not spaces or tabs.

**NameChar characters include letters, underscores, numbers, hyphens, or periods, but not spaces or tabs.

Default attribute values may be as follows:

Default value	Meaning
#REQUIRED	This attribute must be specified by the element
#IMPLIED	This attribute may or may not be used
#FIXED value	If not specified, this attribute is assumed to be value; if specified, it must be value
defaultValue	If not specified, this attribute is assumed to be defaultValue
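The effect of a default value can be seen by parsing a document whose internal subset declares one. This is a sketch only: it assumes Python's standard xml.etree.ElementTree module, whose underlying parser reads attribute defaults from the internal subset without validating the document. The element names list and item, the attribute name status, and its values are invented for the example.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE list [
  <!ELEMENT list (item*)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item status CDATA "draft">
]>
<list><item>First</item><item status="final">Second</item></list>"""

items = ET.fromstring(document).findall("item")

# The first item omits the attribute, so the declared default is supplied;
# the second item specifies its own value.
statuses = [item.get("status") for item in items]
print(statuses)
```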

Comments

A comment consists of text between a <!-- and a -->. The content of comments should be ignored by XML processors. Comments may not contain "--" and they may not contain other comments.

Character references

A character reference is a way of representing Unicode characters in parsed character data. The syntax for character references is as follows:

&#UnicodeValueOfCharacter;

Entity reference declarations

There are five types of entities. The syntax for their declaration is as follows:

Type	Syntax
Parsed internal	<!ENTITY entityName "text of entity">
Parsed external	<!ENTITY entityName SYSTEM "URL of file"> OR <!ENTITY entityName PUBLIC "name of file" "URL of file">
Unparsed external	<!ENTITY entityName SYSTEM "URL of file" NDATA notationName> OR <!ENTITY entityName PUBLIC "name of file" "URL of file" NDATA notationName>
Internal parameter	<!ENTITY % entityName "text of entity">
External parameter	<!ENTITY % entityName SYSTEM "URL of file"> OR <!ENTITY % entityName PUBLIC "name of file" "URL of file">

The syntax for using the first three types of entity reference is &entityName;. The syntax for using a parameter entity is %entityName;. Parameter entity references are always parsed and may be used only in a DTD.
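For instance, a parsed internal entity declared in the internal subset is replaced by its text wherever it is referenced. The sketch below assumes Python's standard xml.etree.ElementTree module; the entity name publisher, the element name note, and the entity text are hypothetical.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE note [
  <!ENTITY publisher "Example Press">
]>
<note>Published by &publisher;.</note>"""

note = ET.fromstring(document)

# The reference &publisher; has been replaced by the declared entity text.
print(note.text)
```

Only the internal case is shown here; whether external entities are fetched depends on the parser and its configuration.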

Notation declarations

Notation declarations should be specified in one of the two following ways:

<!NOTATION notationName SYSTEM "External Identifier">
<!NOTATION notationName PUBLIC "External Identifier Name" "Backup URL">

The external identifier should be the name of an application that can process or display files to which this notation is applied. For example:

<!NOTATION gif SYSTEM "Microsoft Internet Explorer">

Note that it is up to the application that processes the XML to pass the URL to the application indicated by the external identifier.

Processing instructions

<?target instruction?>

Appendix C: Understanding Encodings

Let's say you've just exported an XML file from avenue.quark, and when you go to look at it in your text editor, you see a lower-case "a" with an accent wherever you thought you had a trademark symbol. In fact, a lot of your special symbols look wrong. What happened?

More than likely, your text editor doesn't support the encoding used by your XML file. This section explains the topic in detail.

What is an encoding?

An encoding is a specification that maps a set of characters to corresponding numeric values. For example, the ASCII encoding maps the character "M" to the numeric value 77, "N" to 78, "O" to 79, and so forth.

A text file's encoding allows a program to translate the text file into the proper characters on the screen. Without the encoding, a text file is just a stream of numbers. If you view a text file using the wrong encoding, you're likely to see garbage, because the application opening the file will map the numeric values to the wrong set of characters.
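The mapping between characters and numbers can be inspected directly. The following is a small sketch in Python (chosen only for illustration) using the ASCII values mentioned above; the variable names are arbitrary.

```python
text = "MNO"

# Encoding turns each character into its numeric value...
numbers = list(text.encode("ascii"))
print(numbers)  # [77, 78, 79]

# ...and decoding with the same encoding turns the numbers back into text.
restored = bytes(numbers).decode("ascii")
print(restored)  # MNO
```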

All of the following are encodings:

ASCII
MacRoman (used by Mac OS)
Windows Latin 1 (used by Windows)
UTF-8
UTF-16 (Unicode)
Shift-JIS

Avenue.quark supports the UTF-8, UTF-16, and Shift-JIS encodings.

Lower and upper character ranges

You can divide most encodings into two parts: the first 128 characters (the lower range), and all of the characters after that (the upper range).

Generally speaking, the lower range of most encodings is mapped to the same characters. This range includes the characters a-z, A-Z, 0-9, a handful of punctuation characters, plus some special control characters.

It's when you get into the upper range that you run into trouble. For example, MacRoman and Windows Latin 1 have lower ranges that are nearly identical. So if you take a file that uses only characters from this range and transfer that file from Mac OS to Windows, it looks fine. But if the file contains upper-range characters, you might get some strange results, because many of the upper-range values are mapped to different characters on each platform. For example, a character that shows up as a trademark symbol in Mac OS might show up as a superscript lower-case A in Windows.

When you get such incorrect character displays, it's either because the application displaying the text doesn't know the encoding of that text, or because the application isn't capable of correctly displaying text with the file's specified encoding.
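The difference between the two ranges can be reproduced by decoding the same numeric values under two encodings. This sketch assumes Python's codec names mac_roman (for MacRoman) and cp1252 (taken here as the equivalent of Windows Latin 1); the value 0xAA is chosen because it is the trademark symbol in MacRoman.

```python
# One upper-range value (0xAA = 170) and one lower-range value (77).
upper = bytes([0xAA])
lower = bytes([77])

for codec in ("mac_roman", "cp1252"):
    # The lower-range value decodes identically; the upper-range one does not.
    print(codec, repr(lower.decode(codec)), repr(upper.decode(codec)))
```

Under mac_roman the upper-range value decodes to the trademark symbol; under cp1252 it decodes to the feminine ordinal indicator, which looks like a superscript lower-case a.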

Specifying encodings

You can indicate the encoding of an XML file by including an encoding specification in the file's XML declaration, like so:

<?xml version="1.0" encoding="Shift_JIS" standalone="yes"?>

If an XML file doesn't contain an encoding specification, avenue.quark assumes that the file uses the UTF-8 encoding.
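How a parser relies on the encoding specification can be sketched as follows. The example assumes Python's standard xml.etree.ElementTree module rather than avenue.quark, and uses UTF-16 and UTF-8 because both are handled by that parser; the element name name and its content are invented.

```python
import xml.etree.ElementTree as ET

# A document that declares UTF-16 and is actually stored as UTF-16 bytes.
declared = '<?xml version="1.0" encoding="UTF-16"?><name>Quark\u2122</name>'
utf16_bytes = declared.encode("utf-16")  # includes a byte order mark

# A document with no encoding specification, stored as UTF-8 bytes.
utf8_bytes = "<name>Quark\u2122</name>".encode("utf-8")

from_utf16 = ET.fromstring(utf16_bytes).text
from_utf8 = ET.fromstring(utf8_bytes).text

# Both byte streams yield the same characters once decoded correctly.
print(from_utf16 == from_utf8)
```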

When you save an XML file from avenue.quark, you specify the document's encoding using the Encoding pop-up menu, and avenue.quark automatically generates the appropriate encoding attribute.

Encodings and DTDs

XML lets you specify the encoding of an XML file. However, it doesn't provide a way to specify the encoding of a free-standing DTD file.

Fortunately, avenue.quark does. To specify the encoding of a free-standing DTD, just add the following text as the first line in the file:

<?xml encoding="encodingName" ?>

For example, to specify a free-standing DTD as a UTF-16 DTD, just add the following line to the beginning of the file:

<?xml encoding="UTF-16" ?>